Lab5: Multiple Logisitic Regression & Model Selectionn

General introduction to the biological rationale of this lab was given in the Lab 2-4 lab introductions. In Lab 5, we extend the analysis of habitat use by wolves in two wolf packs from 2001-2005 in Banff National Park to multivariate analysis of habitat selection using multiple logistic regression to estimate a resource selection function (RSF) as a function of the availability of resources.
First, we will deal with the problem of teasing apart correlated independent variables to use in multiple logistic regression models for RSF models. Second, we will use our biological knowledge of the system and the correlations between variables to develop an a-priori set of candidate models that minimize problems from confounded variables following the philosophy of the information theoretic approach to data analysis ( Burnham and Anderson 1998, Anderson et al. 2000).
0.1 Preliminaries: setting packages
0.2 Preliminaries: setting working directories and importing data
### Define a text string that defines where your folder is

laptop="/Users/mark.hebblewhite/Dropbox/WILD 562/Spring2017/lab5/"
#desktop = "/Users/mark.hebblewhite/Documents/Teaching/UofMcourses/WILD562/Spring2017/Labs/lab5"
setwd(laptop)
getwd()
## [1] "/Users/mark.hebblewhite/Dropbox/WILD 562/Spring2017/lab5"
wolfkde <- read.csv("wolfkde.csv", header=TRUE, sep = ",", na.strings="NA", dec=".")
head(wolfkde)
##   deer_w2 moose_w2 elk_w2 sheep_w2 goat_w2 wolf_w2 Elevation2
## 1       4        5      5        3       3       5   1766.146
## 2       4        4      4        1       3       4   1788.780
## 3       4        5      5        4       1       5   1765.100
## 4       4        5      5        4       1       5   1742.913
## 5      NA       NA     NA       NA      NA      NA   1987.394
## 6       1        1      1        1       4       1   1778.360
##   DistFromHumanAccess2 DistFromHighHumanAccess2 landcover16 EASTING
## 1            427.39618                9367.8168           8  580840
## 2            360.50430               10398.5999           2  580000
## 3            283.66480               10296.5167           2  579800
## 4            167.41344                6347.8193           2  583803
## 5             27.90951                8853.8623           2  573900
## 6            622.62573                 723.7941          13  588573
##   NORTHING     pack used usedFactor      habitatType        landcov.f
## 1  5724800 Red Deer    1          1            Shrub            Shrub
## 2  5724195 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 3  5724800 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 4  5725654 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 5  5741316 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 6  5728804 Red Deer    1          1   Burn-Grassland             Burn
##   closedConif modConif openConif decid regen mixed herb shrub water
## 1           0        0         0     0     0     0    0     1     0
## 2           0        1         0     0     0     0    0     0     0
## 3           0        1         0     0     0     0    0     0     0
## 4           0        1         0     0     0     0    0     0     0
## 5           0        1         0     0     0     0    0     0     0
## 6           0        0         0     0     0     0    0     0     0
##   rockIce burn alpineHerb alpineShrub alpine
## 1       0    0          0           0      0
## 2       0    0          0           0      0
## 3       0    0          0           0      0
## 4       0    0          0           0      0
## 5       0    0          0           0      0
## 6       0    1          0           0      0
Objective 1.0 Multiple Logistic Regression & Collinearity
We will first evaluate collinearity between Distance to High Human Use and Elevation
## First lets fit Univariate models of these 2 covariates
elev <- glm(used~Elevation2, data =wolfkde, family= binomial(logit))
disthhacc <-  glm(used~DistFromHighHumanAccess2, data =wolfkde, family= binomial(logit))

# Next, fit both in our first multiple logistic regression model
elev.disthhacc <- glm(used~Elevation2 +DistFromHighHumanAccess2 , data =wolfkde, family= binomial(logit))
summary(elev.disthhacc)
## 
## Call:
## glm(formula = used ~ Elevation2 + DistFromHighHumanAccess2, family = binomial(logit), 
##     data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3850  -0.5246  -0.1744  -0.0467   3.2732  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               1.019e+01  6.515e-01  15.641  < 2e-16 ***
## Elevation2               -7.079e-03  4.257e-04 -16.629  < 2e-16 ***
## DistFromHighHumanAccess2  2.317e-04  3.063e-05   7.566 3.86e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2180.1  on 2369  degrees of freedom
## Residual deviance: 1476.9  on 2367  degrees of freedom
##   (39 observations deleted due to missingness)
## AIC: 1482.9
## 
## Number of Fisher Scoring iterations: 7
# now lets extract coefficients

summary(elev)$coefficients[,1:2]
##                 Estimate   Std. Error
## (Intercept)  7.866205928 0.4888041831
## Elevation2  -0.005455301 0.0003038507
summary(disthhacc)$coefficients[,1:2]
##                               Estimate   Std. Error
## (Intercept)              -1.1458898955 6.659037e-02
## DistFromHighHumanAccess2 -0.0002289669 2.828194e-05
summary(elev.disthhacc)$coefficients[,1:2]
##                             Estimate   Std. Error
## (Intercept)              10.18959176 6.514719e-01
## Elevation2               -0.00707877 4.256818e-04
## DistFromHighHumanAccess2  0.00023174 3.063027e-05
### what just happened to the coefficient for Distance to High Human Access?

## lets visually explore differences
disthumanBnp = 0:7000
prDisthhacc <- predict(disthhacc, newdata=data.frame(DistFromHighHumanAccess2=disthumanBnp), type="response")
head(prDisthhacc)
##         1         2         3         4         5         6 
## 0.2412406 0.2411987 0.2411568 0.2411149 0.2410730 0.2410311
plot(wolfkde$DistFromHighHumanAccess2, wolfkde$used)
lines(disthumanBnp, prDisthhacc, type="l", ylab= "Pr(Used)")

Now lets do for the Multiple Logistic regression model but now we have 2 sets of covariates to consider, elevation and distance from high human access lets determine the ‘median’ elevation
summary(wolfkde$Elevation2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1401    1578    1931    1944    2246    3112
## ok - lets evaluate the probability of use at 1931 meters from the elev.disthhacc model
meanElev = 1931
prElevMedian.Disthhacc <- predict(elev.disthhacc, newdata=data.frame(DistFromHighHumanAccess2=disthumanBnp, Elevation2=meanElev), type="response")
plot(wolfkde$DistFromHighHumanAccess2, wolfkde$used, xlim=(c(0,10000)))
lines(disthumanBnp, prElevMedian.Disthhacc, type="l", ylab= "Pr(Used)")
lines(disthumanBnp, prDisthhacc, type="l", ylab= "Pr(Used)")

What is going on?? how did the coefficient switch sign?
Partial Regression Coefficients - in multiple linear or logistic regression the B’s now change interpretation to the effects of X2 on Y while holding effects of X1 constant at their mean.
Previous plot was only plotted for 1 level of Elevation at the median elevation of 1900m lets create a new data framew with elevations and distance to high human access varying in 10 levels
## using the pretty() function 
#? pretty
newdata <- expand.grid(Elevation2 = pretty(wolfkde$Elevation2, 5), DistFromHighHumanAccess2 = pretty(wolfkde$DistFromHighHumanAccess2, 10))
head(newdata)
##   Elevation2 DistFromHighHumanAccess2
## 1       1000                        0
## 2       1500                        0
## 3       2000                        0
## 4       2500                        0
## 5       3000                        0
## 6       3500                        0
newdata$prElev.Disthha <-predict(elev.disthhacc, newdata, type="response")

ggplot(newdata, aes(x = DistFromHighHumanAccess2, y = prElev.Disthha)) + geom_line() + facet_wrap(~Elevation2)

## Can we hold effects of X1 constant while varying X1?
Why are Elevation and DistHighHumanUse changing?
They are CORRELATED!
cor.test(wolfkde$Elevation2, wolfkde$DistFromHighHumanAccess2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$Elevation2 and wolfkde$DistFromHighHumanAccess2
## t = 30.404, df = 2368, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5002857 0.5582300
## sample estimates:
##       cor 
## 0.5298759
elev_disthha <- lm(DistFromHighHumanAccess2~Elevation2, data=wolfkde)
summary(elev_disthha)
## 
## Call:
## lm(formula = DistFromHighHumanAccess2 ~ Elevation2, data = wolfkde)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6853.3 -1763.8  -388.8   215.3 12510.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6305.7051   297.7015  -21.18   <2e-16 ***
## Elevation2      4.5579     0.1499   30.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2917 on 2368 degrees of freedom
##   (39 observations deleted due to missingness)
## Multiple R-squared:  0.2808, Adjusted R-squared:  0.2805 
## F-statistic: 924.4 on 1 and 2368 DF,  p-value: < 2.2e-16
cor.test(wolfkde$Elevation2, wolfkde$DistFromHighHumanAccess2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$Elevation2 and wolfkde$DistFromHighHumanAccess2
## t = 30.404, df = 2368, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5002857 0.5582300
## sample estimates:
##       cor 
## 0.5298759
plot(wolfkde$Elevation2,wolfkde$DistFromHighHumanAccess2, type="p")
abline(lm(DistFromHighHumanAccess2~Elevation2, data=wolfkde), col="red")

## Can we hold effects of X1 constant while varying X1?
## showing it both ways

pairs(~Elevation2+DistFromHighHumanAccess2, data=wolfkde, main="Scatterplot Matrix")

Elevation2 and distance to High Human Access are correlated AND confounded; i.e., there are no areas far from high human access at LOW elevations
## lets test 2 other variables, elk and deer...
deer <- glm(used~deer_w2, data =wolfkde, family= binomial(logit))
elk <-  glm(used~elk_w2, data =wolfkde, family= binomial(logit))

# Next, fit both in our first multiple logistic regression model
deer.elk <- glm(used~deer_w2 + elk_w2, data =wolfkde, family= binomial(logit))

# now lets extract coefficients
summary(deer)$coefficients[,1:2]
##              Estimate Std. Error
## (Intercept) -5.932341 0.30658301
## deer_w2      1.117935 0.06934898
summary(elk)$coefficients[,1:2]
##              Estimate Std. Error
## (Intercept) -5.883763 0.30283606
## elk_w2       1.122130 0.06991732
summary(deer.elk)$coefficients[,1:2]
##               Estimate Std. Error
## (Intercept) -6.9775171 0.36903412
## deer_w2      0.7469043 0.08686179
## elk_w2       0.6337477 0.08998392
## note this time the sign's didnt flip, but, they significantly changed, weakening in the presence of each other. This is because they are so correlated
cor.test(wolfkde$deer_w2, wolfkde$elk_w2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$deer_w2 and wolfkde$elk_w2
## t = 78.712, df = 2155, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8500532 0.8718655
## sample estimates:
##       cor 
## 0.8613558
Objective 2.0 Screening for Collinearity
2.1 Continuous variables
### Plotting pairwise correlations one at a time.
plot(wolfkde$Elevation2 ,wolfkde$goat_w2, type="p")
abline(lm(goat_w2~Elevation2, data=wolfkde), col="red")

## graphically examining collinearity

# using different methods pairs() in base package
pairs(~deer_w2+moose_w2+elk_w2+sheep_w2+goat_w2+Elevation2, data=wolfkde, main="Scatterplot Matrix")

pairs(~Elevation2+DistFromHumanAccess2+DistFromHighHumanAccess2, data=wolfkde, main="Scatterplot Matrix")

## using car library
scatterplotMatrix(~deer_w2+moose_w2+elk_w2+sheep_w2+goat_w2+Elevation2, data=wolfkde, main="Scatterplot Matrix")

scatterplotMatrix(~Elevation2+DistFromHumanAccess2+DistFromHighHumanAccess2, data=wolfkde, main="Scatterplot Matrix")

scatterplotMatrix(~deer_w2+moose_w2+elk_w2+sheep_w2+goat_w2+Elevation2+DistFromHumanAccess2+DistFromHighHumanAccess2, data=wolfkde, main="Scatterplot Matrix")

## using corrgram
corrgram(wolfkde[1:9], order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Correlations in the Wolf Data")

corrgram(wolfkde[1:9], order=TRUE, lower.panel=panel.pts,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Correlations in the Wolf Data")

corrgram(wolfkde[1:9], order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Correlations in the Wolf Data")

Note there are a billion ways to explore scatter plot matrices
and in R Graphics Cookbook Section 5.13.
## using the ggcorr package
##ggcorrplot <- ggcorr(wolfkde[1:9], label = TRUE)
## note I supressed running the actual code here because it generates a lot of errors
ggcorrplot

## GGally package with ggpairs()
## note I supressed running the actual code here because it generates a lot of errors
ggpairplot<-ggpairs(wolfkde[1:9])
ggpairplot

Function for calculating correlations and probabilities,
Correlations appear below the diagonal and significance probabilities above the diagonal
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X, use="complete.obs")
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  R
}

cor.prob(as.matrix(wolfkde[,c("deer_w2","elk_w2", "moose_w2", "sheep_w2", "goat_w2", "Elevation2", "DistFromHumanAccess2", "DistFromHighHumanAccess2")]))
##                             deer_w2     elk_w2   moose_w2    sheep_w2
## deer_w2                   1.0000000  0.0000000  0.0000000  0.00000000
## elk_w2                    0.8626931  1.0000000  0.0000000  0.00000000
## moose_w2                  0.6744718  0.7261029  1.0000000  0.00000000
## sheep_w2                  0.2342658  0.3433162  0.1775864  1.00000000
## goat_w2                  -0.3080804 -0.2191919 -0.3223450  0.41414693
## Elevation2               -0.7884382 -0.7562834 -0.7246609 -0.01123869
## DistFromHumanAccess2     -0.5299680 -0.5462672 -0.5104927 -0.08041229
## DistFromHighHumanAccess2 -0.3823739 -0.2771004 -0.3593654  0.08629733
##                            goat_w2 Elevation2 DistFromHumanAccess2
## deer_w2                  0.0000000  0.0000000         0.000000e+00
## elk_w2                   0.0000000  0.0000000         0.000000e+00
## moose_w2                 0.0000000  0.0000000         0.000000e+00
## sheep_w2                 0.0000000  0.5813983         7.778882e-05
## goat_w2                  1.0000000  0.0000000         0.000000e+00
## Elevation2               0.4949585  1.0000000         0.000000e+00
## DistFromHumanAccess2     0.2787289  0.6622517         1.000000e+00
## DistFromHighHumanAccess2 0.2867019  0.5298149         4.295220e-01
##                          DistFromHighHumanAccess2
## deer_w2                              0.000000e+00
## elk_w2                               0.000000e+00
## moose_w2                             0.000000e+00
## sheep_w2                             2.222056e-05
## goat_w2                              0.000000e+00
## Elevation2                           0.000000e+00
## DistFromHumanAccess2                 0.000000e+00
## DistFromHighHumanAccess2             1.000000e+00
Now lets modify the function to add *’s for P=0.05 significant correlations try this
cor.prob2 <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X, use="complete.obs")
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  Rstar = ifelse(R[above]<0.05, "***", "NS")
  R[above]=paste(R[above],Rstar)
  R
}

cor.prob2(as.matrix(wolfkde[,c("deer_w2","elk_w2", "moose_w2", "sheep_w2", "goat_w2", "Elevation2", "DistFromHumanAccess2", "DistFromHighHumanAccess2")]))
##                          deer_w2              elk_w2              
## deer_w2                  "1"                  "0 ***"             
## elk_w2                   "0.862693127598712"  "1"                 
## moose_w2                 "0.674471775950577"  "0.726102871977064" 
## sheep_w2                 "0.234265793522001"  "0.343316157179214" 
## goat_w2                  "-0.308080392316883" "-0.219191863521459"
## Elevation2               "-0.788438190137434" "-0.756283409908158"
## DistFromHumanAccess2     "-0.529967964874404" "-0.546267213934725"
## DistFromHighHumanAccess2 "-0.3823738562015"   "-0.277100413472897"
##                          moose_w2             sheep_w2             
## deer_w2                  "0 ***"              "0 ***"              
## elk_w2                   "0 ***"              "0 ***"              
## moose_w2                 "1"                  "0 ***"              
## sheep_w2                 "0.177586400300924"  "1"                  
## goat_w2                  "-0.322344982053384" "0.414146928729061"  
## Elevation2               "-0.724660907098854" "-0.0112386851028276"
## DistFromHumanAccess2     "-0.510492710064905" "-0.0804122947181291"
## DistFromHighHumanAccess2 "-0.359365385166628" "0.0862973269530151" 
##                          goat_w2             Elevation2            
## deer_w2                  "0 ***"             "0 ***"               
## elk_w2                   "0 ***"             "0 ***"               
## moose_w2                 "0 ***"             "0 ***"               
## sheep_w2                 "0 ***"             "0.581398314050348 NS"
## goat_w2                  "1"                 "0 ***"               
## Elevation2               "0.494958474815303" "1"                   
## DistFromHumanAccess2     "0.278728941569084" "0.662251706406083"   
## DistFromHighHumanAccess2 "0.286701871855566" "0.529814916784233"   
##                          DistFromHumanAccess2      
## deer_w2                  "0 ***"                   
## elk_w2                   "0 ***"                   
## moose_w2                 "0 ***"                   
## sheep_w2                 "7.77888244085645e-05 ***"
## goat_w2                  "0 ***"                   
## Elevation2               "0 ***"                   
## DistFromHumanAccess2     "1"                       
## DistFromHighHumanAccess2 "0.429522031098444"       
##                          DistFromHighHumanAccess2  
## deer_w2                  "0 ***"                   
## elk_w2                   "0 ***"                   
## moose_w2                 "0 ***"                   
## sheep_w2                 "2.22205557801614e-05 ***"
## goat_w2                  "0 ***"                   
## Elevation2               "0 ***"                   
## DistFromHumanAccess2     "0 ***"                   
## DistFromHighHumanAccess2 "1"
So which covariates have the highest correlations?? Deer, Elk, and Moose all have correlation coefficients > 0.65; Sheep and Goats are correlated > 0.4; elevation is inversely correlated with an R of -0.75 with deer, elk , moose
Screening for multicollinearity using Variance Inflation Factors
The square root of the variance inflation factor indicates how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other predictor variables in the model. If the variance inflation factor of a predictor variable were 5.27 (√5.27 = 2.3) this means that the standard error for the coefficient of that predictor variable is 2.3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables. The definition of ‘high’ is somewhat arbitrary but values in the range of 5-10 are commonly used. So in this case, we are really concerned with Elevation2
full.model = glm(used~deer_w2 + elk_w2 +moose_w2 +sheep_w2+goat_w2+Elevation2+DistFromHumanAccess2+DistFromHighHumanAccess2, data =wolfkde, family= binomial(logit))
summary(full.model)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2, 
##     family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9200  -0.4636  -0.1536  -0.0378   3.2277  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               6.418e+00  1.349e+00   4.759 1.95e-06 ***
## deer_w2                   1.859e-01  1.169e-01   1.590 0.111758    
## elk_w2                    4.677e-01  1.249e-01   3.744 0.000181 ***
## moose_w2                 -4.129e-01  9.038e-02  -4.568 4.91e-06 ***
## sheep_w2                 -1.015e-01  5.464e-02  -1.857 0.063244 .  
## goat_w2                  -1.800e-01  5.987e-02  -3.006 0.002647 ** 
## Elevation2               -4.746e-03  6.207e-04  -7.646 2.08e-14 ***
## DistFromHumanAccess2     -6.667e-04  1.988e-04  -3.354 0.000796 ***
## DistFromHighHumanAccess2  1.850e-04  3.443e-05   5.372 7.78e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1271.7  on 2109  degrees of freedom
##   (291 observations deleted due to missingness)
## AIC: 1289.7
## 
## Number of Fisher Scoring iterations: 7
#?vif()
vif(full.model)
##                  deer_w2                   elk_w2                 moose_w2 
##                 2.246567                 2.300770                 1.466803 
##                 sheep_w2                  goat_w2               Elevation2 
##                 1.244975                 1.398556                 3.696289 
##     DistFromHumanAccess2 DistFromHighHumanAccess2 
##                 1.280759                 2.069548
## But in the final model, sheep nor deer are significant any more, but they probably shouldnt have been in the model in the first place
2.1 Continuous variables
Assesing collinearity of continuous and categorical variables
cor.test(wolfkde$alpine, wolfkde$Elevation2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$alpine and wolfkde$Elevation2
## t = 9.133, df = 2407, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1441284 0.2213302
## sample estimates:
##       cor 
## 0.1830114
cor.test(wolfkde$burn, wolfkde$Elevation2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$burn and wolfkde$Elevation2
## t = -4.1959, df = 2407, p-value = 2.817e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.12472393 -0.04543012
## sample estimates:
##         cor 
## -0.08521194
cor.test(wolfkde$closedConif, wolfkde$Elevation2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$closedConif and wolfkde$Elevation2
## t = -8.863, df = 2407, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2161762 -0.1388237
## sample estimates:
##        cor 
## -0.1777745
cor.test(wolfkde$herb, wolfkde$Elevation2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$herb and wolfkde$Elevation2
## t = -4.0152, df = 2407, p-value = 6.121e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.12111019 -0.04176791
## sample estimates:
##         cor 
## -0.08156828
cor.test(wolfkde$goat_w2, wolfkde$Elevation2)
## 
##  Pearson's product-moment correlation
## 
## data:  wolfkde$goat_w2 and wolfkde$Elevation2
## t = 26.267, df = 2155, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4598180 0.5237848
## sample estimates:
##       cor 
## 0.4924662
## probably shouldnt consider goat and elevation in the same model

cor.prob(as.matrix(wolfkde[,c("Elevation2", "openConif", "closedConif", "modConif", "burn", "herb", "decid", "burn", "alpine")]))
##              Elevation2     openConif   closedConif      modConif
## Elevation2   1.00000000  9.648789e-05  0.000000e+00  0.000000e+00
## openConif   -0.07936032  1.000000e+00  3.749802e-09  8.659740e-15
## closedConif -0.17777451 -1.197334e-01  1.000000e+00  0.000000e+00
## modConif    -0.39994128 -1.571507e-01 -3.325652e-01  1.000000e+00
## burn        -0.08521194 -4.493261e-02 -9.508718e-02 -1.248025e-01
## herb        -0.08156828 -3.669530e-02 -7.765525e-02 -1.019230e-01
## decid       -0.04807643 -8.399245e-03 -1.777463e-02 -2.332931e-02
## burn.1      -0.08521194 -4.493261e-02 -9.508718e-02 -1.248025e-01
## alpine       0.18301143 -4.712935e-02 -9.973596e-02 -1.309040e-01
##                      burn          herb        decid        burn.1
## Elevation2   2.816924e-05  6.120894e-05  0.018284488  2.816924e-05
## openConif    2.743081e-02  7.174472e-02  0.680309085  2.743081e-02
## closedConif  2.936649e-06  1.360588e-04  0.383195928  2.936649e-06
## modConif     7.927849e-10  5.358745e-07  0.252374752  7.927849e-10
## burn         1.000000e+00  1.527488e-01  0.743498683  0.000000e+00
## herb        -2.914186e-02  1.000000e+00  0.789288851  1.527488e-01
## decid       -6.670325e-03 -5.447483e-03  1.000000000  7.434987e-01
## burn.1       1.000000e+00 -2.914186e-02 -0.006670325  1.000000e+00
## alpine      -3.742813e-02 -3.056659e-02 -0.006996435 -3.742813e-02
##                   alpine
## Elevation2  0.000000e+00
## openConif   2.070797e-02
## closedConif 9.347979e-07
## modConif    1.122872e-10
## burn        6.625103e-02
## herb        1.336590e-01
## decid       7.314315e-01
## burn.1      6.625103e-02
## alpine      1.000000e+00
Lets plot the correlations between Elevation [7] and landcover types [18:29]
corrgram(wolfkde[c(7, 18:29)], order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Landcover Correlations with Elevation")

## so nothing too egregious 

## next human access[8]
corrgram(wolfkde[c(8, 18:29)], order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Landcover Correlations with Distance from Human Access")

## again, only issue is Rock and Ice but even then its not a huge effect. 

## we can essentially see this here: 
boxplot(Elevation2~landcov.f, ylab="Elevation (m)", data=wolfkde, las=3)

boxplot(DistFromHumanAccess2~landcov.f, ylab="Elevation (m)", data=wolfkde, las=3)

## so collinearity is not as important for categorical variables but it becomes important if we start to assess cagetorical interactions with continuous factors. 
2.0 Interaction between categorical factor and continuous
Relationship between whether wolves are responding to human use differently in open and closed cover types.
wolfkde$closed = 0
wolfkde$closed <- wolfkde$closedConif + wolfkde$modConif + wolfkde$openConif + wolfkde$decid + wolfkde$mixed + wolfkde$burn
## note I considered burn here as 'closed' - could change. 

wolfkde$closedFactor <-as.factor(wolfkde$closed)

ggplot(wolfkde, aes(x=DistFromHighHumanAccess2, y=used, fill=closedFactor)) + stat_smooth(method="glm", method.args = list(family="binomial"), level=0.95) #+ facet_grid(closed~.)

##### This shows the effect of distance from high human access varies a lot with whether wolves are in closed cover or not. But does there look to be an interaction?

## this shows the effect of distance from high human access varies a lot with whether wolves are in closed cover or not
## But does there look to be an interaction?

boxplot(DistFromHighHumanAccess2~closed, ylab="Distance from High Human (m)", data=wolfkde)

# so yes, you can only get far away from humans evidently in open landcover types (rock / ice) but this isnt that big a problem. 

## lets fit the model now
disthha.cover <-  glm(used~closed + DistFromHighHumanAccess2 + closed*DistFromHighHumanAccess2, data =wolfkde, family= binomial(logit))
summary(disthha.cover)
## 
## Call:
## glm(formula = used ~ closed + DistFromHighHumanAccess2 + closed * 
##     DistFromHighHumanAccess2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7790  -0.7403  -0.5331  -0.2311   3.0188  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -1.642e+00  1.573e-01 -10.436  < 2e-16 ***
## closed                           6.050e-01  1.737e-01   3.483 0.000495 ***
## DistFromHighHumanAccess2        -2.561e-04  5.350e-05  -4.787 1.69e-06 ***
## closed:DistFromHighHumanAccess2  1.165e-04  6.298e-05   1.850 0.064321 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2180.1  on 2369  degrees of freedom
## Residual deviance: 2036.9  on 2366  degrees of freedom
##   (39 observations deleted due to missingness)
## AIC: 2044.9
## 
## Number of Fisher Scoring iterations: 6
boxplot(DistFromHighHumanAccess2~closedFactor, ylab="Distance From High Human Access (m)", data=wolfkde)

So yes, you can only get far away from humans evidently in open landcover types (rock / ice) but this isnt that big a problem.
Lets try again with distance to human access
ggplot(wolfkde, aes(x=DistFromHumanAccess2, y=used, fill=closedFactor)) + stat_smooth(method="glm", method.args = list(family="binomial"), level=0.95) #+ facet_grid(closed~.)

## bit more evidence of an interaction here (the lines cross)
boxplot(DistFromHumanAccess2~closedFactor, ylab="Distance From High Human Access (m)", data=wolfkde)

distha.cover <-  glm(used~closed + DistFromHumanAccess2 + closed*DistFromHumanAccess2, data =wolfkde, family= binomial(logit))
summary(distha.cover)
## 
## Call:
## glm(formula = used ~ closed + DistFromHumanAccess2 + closed * 
##     DistFromHumanAccess2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.23950  -0.75295  -0.33048  -0.00622   2.82414  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  0.1448340  0.2022547   0.716  0.47393    
## closed                      -0.6663391  0.2216303  -3.007  0.00264 ** 
## DistFromHumanAccess2        -0.0044402  0.0005556  -7.992 1.33e-15 ***
## closed:DistFromHumanAccess2  0.0029658  0.0005815   5.100 3.40e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2180.1  on 2369  degrees of freedom
## Residual deviance: 1767.6  on 2366  degrees of freedom
##   (39 observations deleted due to missingness)
## AIC: 1775.6
## 
## Number of Fisher Scoring iterations: 8

OBJECTIVE 3.0 - Model building

How best to proceed? Can’t really consider models with moose, deer and elk together should not consider models with distance to high human use and elevation together
One option is Stepwise model selection.
But first we have to clean up msising data
Note there were 252 NA’s for some covariates

3.1 Manual Model Selection Using AIC

After we have fit ANY model in the str() we can see that there is AIC information stored
cover <-  glm(used~closed, data =wolfkde2, family= binomial(logit))
## Estimating AIC manually

logLik(cover) ## this is the log likelihood
## 'log Lik.' -976.7078 (df=2)
2*(length(cover$coefficients)) ## this is the number of parameters
## [1] 4
## But note that we can calcualte AIC manually using -2* LL + 2*K where LL is logLik and K is # of parameters (without considering the small sample size correction)
-2*as.numeric(logLik(cover))+2*(length(cover$coefficients))
## [1] 1957.416
cover$aic
## [1] 1957.416
## Lets use using AIC to select interactions...
distha <-  glm(used~DistFromHumanAccess2, data =wolfkde2, family= binomial(logit))
distha.cover <-  glm(used~closed + DistFromHumanAccess2, data =wolfkde2, family= binomial(logit)) ## Main effects only
disthaXcover <-  glm(used~closed + DistFromHumanAccess2 + closed*DistFromHumanAccess2, data =wolfkde2, family= binomial(logit))
      
AIC(cover, distha, distha.cover, disthaXcover)
##              df      AIC
## cover         2 1957.416
## distha        2 1655.955
## distha.cover  3 1652.944
## disthaXcover  4 1622.678
## so STRONG evidence that model disthhaXcover is much better than disthhaXcover
      
## lets redo with distance to high human access
disthha <-  glm(used~DistFromHighHumanAccess2, data =wolfkde2, family= binomial(logit))
disthha.cover <-  glm(used~closed + DistFromHighHumanAccess2, data =wolfkde2, family= binomial(logit)) ## Main effects only
disthhaXcover <-  glm(used~closed + DistFromHighHumanAccess2 + closed*DistFromHighHumanAccess2, data =wolfkde2, family= binomial(logit))
      
AIC(cover, disthha, disthha.cover, disthhaXcover)
##               df      AIC
## cover          2 1957.416
## disthha        2 1931.577
## disthha.cover  3 1896.049
## disthhaXcover  4 1887.325
## Again, STRONG evidence that model disthhaXcover is much better than disthhaXcover
3.2 Stepwise model selection
# Lets review the full.model again
full.model = glm(used~deer_w2 + elk_w2 +moose_w2 +sheep_w2+goat_w2+Elevation2+DistFromHumanAccess2+DistFromHighHumanAccess2 +closed + closed*DistFromHighHumanAccess2, data =wolfkde2, family= binomial(logit))
Note here that I suppressed the output of this next command
## Backwards selection
stepAIC(full.model, direction = "backward")
## Start:  AIC=1289.86
## used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + 
##     DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed + 
##     closed * DistFromHighHumanAccess2
## 
##                                   Df Deviance    AIC
## - DistFromHighHumanAccess2:closed  1   1269.2 1289.2
## <none>                                 1267.9 1289.9
## - deer_w2                          1   1270.6 1290.6
## - sheep_w2                         1   1271.0 1291.0
## - goat_w2                          1   1277.6 1297.6
## - DistFromHumanAccess2             1   1280.3 1300.3
## - elk_w2                           1   1280.9 1300.9
## - moose_w2                         1   1290.1 1310.1
## - Elevation2                       1   1340.6 1360.6
## 
## Step:  AIC=1289.2
## used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + 
##     DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed
## 
##                            Df Deviance    AIC
## <none>                          1269.2 1289.2
## - closed                    1   1271.7 1289.7
## - deer_w2                   1   1272.3 1290.3
## - sheep_w2                  1   1272.8 1290.8
## - goat_w2                   1   1279.2 1297.2
## - DistFromHumanAccess2      1   1282.0 1300.0
## - elk_w2                    1   1282.4 1300.4
## - moose_w2                  1   1290.3 1308.3
## - DistFromHighHumanAccess2  1   1296.0 1314.0
## - Elevation2                1   1341.1 1359.1
## 
## Call:  glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed, family = binomial(logit), data = wolfkde2)
## 
## Coefficients:
##              (Intercept)                   deer_w2  
##                6.6091155                 0.2057860  
##                   elk_w2                  moose_w2  
##                0.4493350                -0.3994125  
##                 sheep_w2                   goat_w2  
##               -0.1032437                -0.1873161  
##               Elevation2      DistFromHumanAccess2  
##               -0.0047249                -0.0006645  
## DistFromHighHumanAccess2                    closed  
##                0.0001813                -0.3139421  
## 
## Degrees of Freedom: 2117 Total (i.e. Null);  2108 Residual
## Null Deviance:       2041 
## Residual Deviance: 1269  AIC: 1289
top.backwards = glm(used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2, data=wolfkde2,family=binomial(logit))
summary(top.backwards)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2, 
##     family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9200  -0.4636  -0.1536  -0.0378   3.2277  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               6.418e+00  1.349e+00   4.759 1.95e-06 ***
## deer_w2                   1.859e-01  1.169e-01   1.590 0.111758    
## elk_w2                    4.677e-01  1.249e-01   3.744 0.000181 ***
## moose_w2                 -4.129e-01  9.038e-02  -4.568 4.91e-06 ***
## sheep_w2                 -1.015e-01  5.464e-02  -1.857 0.063244 .  
## goat_w2                  -1.800e-01  5.987e-02  -3.006 0.002647 ** 
## Elevation2               -4.746e-03  6.207e-04  -7.646 2.08e-14 ***
## DistFromHumanAccess2     -6.667e-04  1.988e-04  -3.354 0.000796 ***
## DistFromHighHumanAccess2  1.850e-04  3.443e-05   5.372 7.78e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1271.7  on 2109  degrees of freedom
## AIC: 1289.7
## 
## Number of Fisher Scoring iterations: 7
NExt, lets do Forward Stepwise model selection
# Forwards selection - First define a NULL model as the starting place
null.model = glm(used~1,data=wolfkde2,family=binomial(logit))
This time with output from stepAIC supressed
## Backwards selection
stepAIC(full.model, direction = "backward")
## Start:  AIC=1289.86
## used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + 
##     DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed + 
##     closed * DistFromHighHumanAccess2
## 
##                                   Df Deviance    AIC
## - DistFromHighHumanAccess2:closed  1   1269.2 1289.2
## <none>                                 1267.9 1289.9
## - deer_w2                          1   1270.6 1290.6
## - sheep_w2                         1   1271.0 1291.0
## - goat_w2                          1   1277.6 1297.6
## - DistFromHumanAccess2             1   1280.3 1300.3
## - elk_w2                           1   1280.9 1300.9
## - moose_w2                         1   1290.1 1310.1
## - Elevation2                       1   1340.6 1360.6
## 
## Step:  AIC=1289.2
## used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + 
##     DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed
## 
##                            Df Deviance    AIC
## <none>                          1269.2 1289.2
## - closed                    1   1271.7 1289.7
## - deer_w2                   1   1272.3 1290.3
## - sheep_w2                  1   1272.8 1290.8
## - goat_w2                   1   1279.2 1297.2
## - DistFromHumanAccess2      1   1282.0 1300.0
## - elk_w2                    1   1282.4 1300.4
## - moose_w2                  1   1290.3 1308.3
## - DistFromHighHumanAccess2  1   1296.0 1314.0
## - Elevation2                1   1341.1 1359.1
## 
## Call:  glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed, family = binomial(logit), data = wolfkde2)
## 
## Coefficients:
##              (Intercept)                   deer_w2  
##                6.6091155                 0.2057860  
##                   elk_w2                  moose_w2  
##                0.4493350                -0.3994125  
##                 sheep_w2                   goat_w2  
##               -0.1032437                -0.1873161  
##               Elevation2      DistFromHumanAccess2  
##               -0.0047249                -0.0006645  
## DistFromHighHumanAccess2                    closed  
##                0.0001813                -0.3139421  
## 
## Degrees of Freedom: 2117 Total (i.e. Null);  2108 Residual
## Null Deviance:       2041 
## Residual Deviance: 1269  AIC: 1289
top.backwards = glm(used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2, data=wolfkde2,family=binomial(logit))
summary(top.backwards)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2, 
##     family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9200  -0.4636  -0.1536  -0.0378   3.2277  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               6.418e+00  1.349e+00   4.759 1.95e-06 ***
## deer_w2                   1.859e-01  1.169e-01   1.590 0.111758    
## elk_w2                    4.677e-01  1.249e-01   3.744 0.000181 ***
## moose_w2                 -4.129e-01  9.038e-02  -4.568 4.91e-06 ***
## sheep_w2                 -1.015e-01  5.464e-02  -1.857 0.063244 .  
## goat_w2                  -1.800e-01  5.987e-02  -3.006 0.002647 ** 
## Elevation2               -4.746e-03  6.207e-04  -7.646 2.08e-14 ***
## DistFromHumanAccess2     -6.667e-04  1.988e-04  -3.354 0.000796 ***
## DistFromHighHumanAccess2  1.850e-04  3.443e-05   5.372 7.78e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1271.7  on 2109  degrees of freedom
## AIC: 1289.7
## 
## Number of Fisher Scoring iterations: 7
# Forwards selection
null.model = glm(used~1,data=wolfkde2,family=binomial(logit))
stepAIC(null.model, scope=list(upper=full.model, lower= null.model),direction="forward")
## Start:  AIC=2042.9
## used ~ 1
## 
##                            Df Deviance    AIC
## + Elevation2                1   1405.4 1409.4
## + deer_w2                   1   1592.4 1596.4
## + elk_w2                    1   1625.8 1629.8
## + DistFromHumanAccess2      1   1652.0 1656.0
## + goat_w2                   1   1818.7 1822.7
## + DistFromHighHumanAccess2  1   1927.6 1931.6
## + moose_w2                  1   1931.3 1935.3
## + closed                    1   1953.4 1957.4
## + sheep_w2                  1   2027.7 2031.7
## <none>                          2040.9 2042.9
## 
## Step:  AIC=1409.4
## used ~ Elevation2
## 
##                            Df Deviance    AIC
## + DistFromHighHumanAccess2  1   1359.7 1365.7
## + elk_w2                    1   1371.9 1377.9
## + deer_w2                   1   1376.4 1382.4
## + moose_w2                  1   1377.5 1383.5
## + DistFromHumanAccess2      1   1384.2 1390.2
## + closed                    1   1399.6 1405.6
## + goat_w2                   1   1401.8 1407.8
## <none>                          1405.4 1409.4
## + sheep_w2                  1   1403.5 1409.5
## 
## Step:  AIC=1365.74
## used ~ Elevation2 + DistFromHighHumanAccess2
## 
##                        Df Deviance    AIC
## + moose_w2              1   1326.1 1334.1
## + deer_w2               1   1335.3 1343.3
## + DistFromHumanAccess2  1   1340.8 1348.8
## + elk_w2                1   1342.9 1350.9
## + closed                1   1356.1 1364.1
## + goat_w2               1   1356.2 1364.2
## + sheep_w2              1   1357.0 1365.0
## <none>                      1359.7 1365.7
## 
## Step:  AIC=1334.14
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2
## 
##                        Df Deviance    AIC
## + elk_w2                1   1300.8 1310.8
## + deer_w2               1   1310.8 1320.8
## + DistFromHumanAccess2  1   1312.1 1322.1
## + goat_w2               1   1322.8 1332.8
## <none>                      1326.1 1334.1
## + closed                1   1324.5 1334.5
## + sheep_w2              1   1325.6 1335.6
## 
## Step:  AIC=1310.76
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2
## 
##                        Df Deviance    AIC
## + goat_w2               1   1289.0 1301.0
## + DistFromHumanAccess2  1   1289.5 1301.5
## + sheep_w2              1   1294.9 1306.9
## <none>                      1300.8 1310.8
## + closed                1   1299.5 1311.5
## + deer_w2               1   1299.6 1311.6
## 
## Step:  AIC=1300.99
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2 + 
##     goat_w2
## 
##                        Df Deviance    AIC
## + DistFromHumanAccess2  1   1277.2 1291.2
## + sheep_w2              1   1286.0 1300.0
## <none>                      1289.0 1301.0
## + closed                1   1287.0 1301.0
## + deer_w2               1   1287.8 1301.8
## 
## Step:  AIC=1291.21
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2 + 
##     goat_w2 + DistFromHumanAccess2
## 
##            Df Deviance    AIC
## + sheep_w2  1   1274.2 1290.2
## + deer_w2   1   1275.1 1291.1
## <none>          1277.2 1291.2
## + closed    1   1275.3 1291.3
## 
## Step:  AIC=1290.24
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2 + 
##     goat_w2 + DistFromHumanAccess2 + sheep_w2
## 
##           Df Deviance    AIC
## + deer_w2  1   1271.7 1289.7
## <none>         1274.2 1290.2
## + closed   1   1272.3 1290.3
## 
## Step:  AIC=1289.68
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2 + 
##     goat_w2 + DistFromHumanAccess2 + sheep_w2 + deer_w2
## 
##          Df Deviance    AIC
## + closed  1   1269.2 1289.2
## <none>        1271.7 1289.7
## 
## Step:  AIC=1289.2
## used ~ Elevation2 + DistFromHighHumanAccess2 + moose_w2 + elk_w2 + 
##     goat_w2 + DistFromHumanAccess2 + sheep_w2 + deer_w2 + closed
## 
##                                   Df Deviance    AIC
## <none>                                 1269.2 1289.2
## + DistFromHighHumanAccess2:closed  1   1267.9 1289.9
## 
## Call:  glm(formula = used ~ Elevation2 + DistFromHighHumanAccess2 + 
##     moose_w2 + elk_w2 + goat_w2 + DistFromHumanAccess2 + sheep_w2 + 
##     deer_w2 + closed, family = binomial(logit), data = wolfkde2)
## 
## Coefficients:
##              (Intercept)                Elevation2  
##                6.6091155                -0.0047249  
## DistFromHighHumanAccess2                  moose_w2  
##                0.0001813                -0.3994125  
##                   elk_w2                   goat_w2  
##                0.4493350                -0.1873161  
##     DistFromHumanAccess2                  sheep_w2  
##               -0.0006645                -0.1032437  
##                  deer_w2                    closed  
##                0.2057860                -0.3139421  
## 
## Degrees of Freedom: 2117 Total (i.e. Null);  2108 Residual
## Null Deviance:       2041 
## Residual Deviance: 1269  AIC: 1289
## lots of output supressed in Rmarkdown

## Ok - Note the best model selected with stepwise forward selection was the same
top.forward = glm(used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed, data=wolfkde2,family=binomial(logit))
summary(top.forward)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed, family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.99463  -0.45913  -0.15990  -0.04219   3.15046  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               6.609e+00  1.346e+00   4.910 9.12e-07 ***
## deer_w2                   2.058e-01  1.175e-01   1.751 0.079866 .  
## elk_w2                    4.493e-01  1.253e-01   3.585 0.000337 ***
## moose_w2                 -3.994e-01  9.023e-02  -4.427 9.56e-06 ***
## sheep_w2                 -1.032e-01  5.465e-02  -1.889 0.058874 .  
## goat_w2                  -1.873e-01  6.010e-02  -3.116 0.001830 ** 
## Elevation2               -4.725e-03  6.170e-04  -7.658 1.89e-14 ***
## DistFromHumanAccess2     -6.645e-04  1.981e-04  -3.354 0.000796 ***
## DistFromHighHumanAccess2  1.813e-04  3.428e-05   5.288 1.23e-07 ***
## closed                   -3.139e-01  1.991e-01  -1.577 0.114856    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1269.2  on 2108  degrees of freedom
## AIC: 1289.2
## 
## Number of Fisher Scoring iterations: 7
vif(top.forward)
##                  deer_w2                   elk_w2                 moose_w2 
##                 2.319067                 2.370088                 1.499926 
##                 sheep_w2                  goat_w2               Elevation2 
##                 1.251304                 1.422406                 3.774381 
##     DistFromHumanAccess2 DistFromHighHumanAccess2                   closed 
##                 1.293130                 2.084111                 1.063505
But there are a bunch of collinear variables in the model, moose/elk/deer, human/human/human.. basically everything is being retained, not much kicked out.
Now what about landcover (rock Ice as intercept)
full.model.landcov = glm(used~ closedConif +modConif+openConif+decid+regen+mixed+herb+shrub+water+burn+alpine, data =wolfkde2, family= binomial(logit))
stepAIC(full.model.landcov, direction = "backward")
## Start:  AIC=1699.74
## used ~ closedConif + modConif + openConif + decid + regen + mixed + 
##     herb + shrub + water + burn + alpine
## 
## 
## Step:  AIC=1699.74
## used ~ closedConif + modConif + openConif + decid + mixed + herb + 
##     shrub + water + burn + alpine
## 
##               Df Deviance    AIC
## - decid        1   1677.8 1697.8
## - alpine       1   1678.0 1698.0
## <none>             1677.7 1699.7
## - closedConif  1   1728.2 1748.2
## - water        1   1737.9 1757.9
## - herb         1   1753.8 1773.8
## - shrub        1   1757.1 1777.1
## - openConif    1   1778.8 1798.8
## - burn         1   1810.6 1830.6
## - mixed        1   1815.0 1835.0
## - modConif     1   1847.5 1867.5
## 
## Step:  AIC=1697.84
## used ~ closedConif + modConif + openConif + mixed + herb + shrub + 
##     water + burn + alpine
## 
##               Df Deviance    AIC
## - alpine       1   1678.1 1696.1
## <none>             1677.8 1697.8
## - closedConif  1   1728.6 1746.6
## - water        1   1738.1 1756.1
## - herb         1   1754.1 1772.1
## - shrub        1   1757.4 1775.4
## - openConif    1   1779.2 1797.2
## - burn         1   1811.0 1829.0
## - mixed        1   1815.4 1833.4
## - modConif     1   1848.5 1866.5
## 
## Step:  AIC=1696.07
## used ~ closedConif + modConif + openConif + mixed + herb + shrub + 
##     water + burn
## 
##               Df Deviance    AIC
## <none>             1678.1 1696.1
## - closedConif  1   1731.0 1747.0
## - water        1   1738.4 1754.4
## - herb         1   1754.9 1770.9
## - shrub        1   1759.3 1775.3
## - openConif    1   1781.9 1797.9
## - burn         1   1813.9 1829.9
## - mixed        1   1817.6 1833.6
## - modConif     1   1860.9 1876.9
## 
## Call:  glm(formula = used ~ closedConif + modConif + openConif + mixed + 
##     herb + shrub + water + burn, family = binomial(logit), data = wolfkde2)
## 
## Coefficients:
## (Intercept)  closedConif     modConif    openConif        mixed  
##      -3.970        2.022        2.923        3.305        4.773  
##        herb        shrub        water         burn  
##       3.970        3.109        4.376        4.092  
## 
## Degrees of Freedom: 2117 Total (i.e. Null);  2109 Residual
## Null Deviance:       2041 
## Residual Deviance: 1678  AIC: 1696
top.model.landcov = glm(used~openConif+modConif+closedConif+mixed+herb+shrub+water+burn, data =wolfkde2, family= binomial(logit))
summary(top.model.landcov)
## 
## Call:
## glm(formula = used ~ openConif + modConif + closedConif + mixed + 
##     herb + shrub + water + burn, family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5315  -0.7756  -0.5162  -0.1933   2.8245  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.9703     0.2914 -13.627  < 2e-16 ***
## openConif     3.3045     0.3547   9.317  < 2e-16 ***
## modConif      2.9232     0.3050   9.584  < 2e-16 ***
## closedConif   2.0219     0.3239   6.242 4.33e-10 ***
## mixed         4.7726     0.4431  10.772  < 2e-16 ***
## herb          3.9703     0.4427   8.968  < 2e-16 ***
## shrub         3.1088     0.3637   8.547  < 2e-16 ***
## water         4.3758     0.5415   8.081 6.43e-16 ***
## burn          4.0917     0.3817  10.719  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1678.1  on 2109  degrees of freedom
## AIC: 1696.1
## 
## Number of Fisher Scoring iterations: 6
vif(top.model.landcov)
##   openConif    modConif closedConif       mixed        herb       shrub 
##    2.795640    6.214955    4.265224    1.703239    1.705037    2.571701 
##       water        burn 
##    1.382375    2.249269
## lets use this combination of Landcover covariates next as the BEST top model
OBJECTIVE 4.0 - model selection using the AICcmodavg package
Ok - we are going to take 2 competing sets of models.
Model 1 set - JUST biotic covariates, prey species and humans
Model 2 set - JUST environmental covariate models
Class excercise on the board
FIT CANDIDATE MODELS with the AICcmodavg package
m.biotic <- list()
head(m.biotic)
## list()
#lets fit our a-priori list of models 

## Model set 1: Biotic interactions, deer/elk/moose all too correlated to put in the same model
## sheep and goat are OK

## Model set 1: Biotic
m.biotic[[1]] <- glm(used ~ 1, family=binomial(logit), data=wolfkde2)
m.biotic[[2]] <- glm(used ~ elk_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[3]] <- glm(used ~ deer_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[4]] <- glm(used ~ moose_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[5]] <- glm(used ~ sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[6]] <- glm(used ~ goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[7]] <- glm(used ~ moose_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[8]] <- glm(used ~ deer_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[9]] <- glm(used ~ elk_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[10]] <- glm(used ~ elk_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[11]] <- glm(used ~ deer_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[12]] <- glm(used ~ moose_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[13]] <- glm(used ~ sheep_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[14]] <- glm(used ~ DistFromHighHumanAccess2, family=binomial(logit), data=wolfkde2)
m.biotic[[15]] <- glm(used ~ DistFromHighHumanAccess2+deer_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[16]] <- glm(used ~ DistFromHighHumanAccess2+moose_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[17]] <- glm(used ~ DistFromHighHumanAccess2+sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[18]] <- glm(used ~ DistFromHighHumanAccess2+goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[19]] <- glm(used ~ DistFromHighHumanAccess2+moose_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[20]] <- glm(used ~ DistFromHighHumanAccess2+deer_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[21]] <- glm(used ~ DistFromHighHumanAccess2+elk_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[22]] <- glm(used ~ DistFromHighHumanAccess2+elk_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[23]] <- glm(used ~ DistFromHighHumanAccess2+deer_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[24]] <- glm(used ~ DistFromHighHumanAccess2+moose_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[25]] <- glm(used ~ DistFromHighHumanAccess2+sheep_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[26]] <- glm(used ~ DistFromHighHumanAccess2, family=binomial(logit), data=wolfkde2)
m.biotic[[27]] <- glm(used ~ DistFromHighHumanAccess2+deer_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[28]] <- glm(used ~ DistFromHighHumanAccess2+moose_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[29]] <- glm(used ~ DistFromHighHumanAccess2+sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[30]] <- glm(used ~ DistFromHighHumanAccess2+goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[31]] <- glm(used ~ DistFromHighHumanAccess2+moose_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[32]] <- glm(used ~ DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)
m.biotic[[33]] <- glm(used ~ DistFromHumanAccess2+deer_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[34]] <- glm(used ~ DistFromHumanAccess2+moose_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[35]] <- glm(used ~ DistFromHumanAccess2+sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[36]] <- glm(used ~ DistFromHumanAccess2+goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[37]] <- glm(used ~ DistFromHumanAccess2+moose_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[38]] <- glm(used ~ DistFromHumanAccess2+deer_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[39]] <- glm(used ~ DistFromHumanAccess2+elk_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[40]] <- glm(used ~ DistFromHumanAccess2+elk_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[41]] <- glm(used ~ DistFromHumanAccess2+deer_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[42]] <- glm(used ~ DistFromHumanAccess2+moose_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[43]] <- glm(used ~ DistFromHumanAccess2+sheep_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[44]] <- glm(used ~ DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)
m.biotic[[45]] <- glm(used ~ DistFromHumanAccess2+deer_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[46]] <- glm(used ~ DistFromHumanAccess2+moose_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[47]] <- glm(used ~ DistFromHumanAccess2+sheep_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[48]] <- glm(used ~ DistFromHumanAccess2+goat_w2, family=binomial(logit), data=wolfkde2)
m.biotic[[49]] <- glm(used ~ DistFromHumanAccess2+moose_w2 + sheep_w2, family=binomial(logit), data=wolfkde2)
                   

## then name our models .
## note you can name your models with a command like this
# model.names <-  ("null", "disthha", "distacc", "sheepwi", "goatwin", "elkwint", "moosewin", "deerwin") but in this case there were 49 models
model.names.biotic <-c("m0","m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12","m13","m14","m15","m16","m17","m18","m19","m20","m21","m22","m23","m24","m25","m26","m27","m28","m29","m30","m31","m32","m33","m34","m35","m36","m37","m38","m39","m40","m41","m42","m43","m44", "m45","m46","m47","m48")
model.names.biotic <-1:49

aictab(cand.set = m.biotic, modnames = model.names.biotic)
## 
## Model selection based on AICc:
## 
##    K    AICc Delta_AICc AICcWt Cum.Wt       LL
## 41 4 1406.25       0.00    0.8    0.8  -699.12
## 40 4 1409.02       2.76    0.2    1.0  -700.50
## 38 4 1431.91      25.66    0.0    1.0  -711.95
## 39 4 1442.35      36.09    0.0    1.0  -717.16
## 45 3 1455.14      48.88    0.0    1.0  -724.56
## 33 3 1455.14      48.88    0.0    1.0  -724.56
## 22 4 1475.97      69.71    0.0    1.0  -733.97
## 10 3 1480.37      74.12    0.0    1.0  -737.18
## 11 3 1494.35      88.10    0.0    1.0  -744.17
## 23 4 1496.27      90.02    0.0    1.0  -744.13
## 21 4 1523.86     117.61    0.0    1.0  -757.92
## 9  3 1538.90     132.65    0.0    1.0  -766.45
## 8  3 1548.35     142.09    0.0    1.0  -771.17
## 20 4 1548.49     142.24    0.0    1.0  -770.24
## 48 3 1573.18     166.92    0.0    1.0  -783.58
## 36 3 1573.18     166.92    0.0    1.0  -783.58
## 43 4 1574.46     168.21    0.0    1.0  -783.22
## 42 4 1574.86     168.60    0.0    1.0  -783.42
## 27 3 1590.86     184.61    0.0    1.0  -792.42
## 15 3 1590.86     184.61    0.0    1.0  -792.42
## 3  2 1596.37     190.12    0.0    1.0  -796.18
## 2  2 1629.82     223.57    0.0    1.0  -812.91
## 49 4 1636.92     230.67    0.0    1.0  -814.45
## 37 4 1636.92     230.67    0.0    1.0  -814.45
## 47 3 1643.21     236.96    0.0    1.0  -818.60
## 35 3 1643.21     236.96    0.0    1.0  -818.60
## 46 3 1650.89     244.64    0.0    1.0  -822.44
## 34 3 1650.89     244.64    0.0    1.0  -822.44
## 44 2 1655.96     249.71    0.0    1.0  -825.98
## 32 2 1655.96     249.71    0.0    1.0  -825.98
## 24 4 1765.76     359.51    0.0    1.0  -878.87
## 30 3 1784.38     378.12    0.0    1.0  -889.18
## 18 3 1784.38     378.12    0.0    1.0  -889.18
## 25 4 1784.39     378.14    0.0    1.0  -888.19
## 12 3 1786.62     380.36    0.0    1.0  -890.30
## 13 3 1821.85     415.60    0.0    1.0  -907.92
## 6  2 1822.67     416.42    0.0    1.0  -909.33
## 31 4 1865.18     458.92    0.0    1.0  -928.58
## 19 4 1865.18     458.92    0.0    1.0  -928.58
## 28 3 1881.35     475.10    0.0    1.0  -937.67
## 16 3 1881.35     475.10    0.0    1.0  -937.67
## 7  3 1906.78     500.53    0.0    1.0  -950.38
## 29 3 1924.60     518.35    0.0    1.0  -959.29
## 17 3 1924.60     518.35    0.0    1.0  -959.29
## 26 2 1931.58     525.33    0.0    1.0  -963.79
## 14 2 1931.58     525.33    0.0    1.0  -963.79
## 4  2 1935.30     529.05    0.0    1.0  -965.65
## 5  2 2031.68     625.43    0.0    1.0 -1013.84
## 1  1 2042.90     636.64    0.0    1.0 -1020.45
## OK so the top model was model 41

top.biotic <- glm(used ~ DistFromHumanAccess2+deer_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
summary(top.biotic)
## 
## Call:
## glm(formula = used ~ DistFromHumanAccess2 + deer_w2 + goat_w2, 
##     family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8345  -0.6246  -0.1888  -0.0306   3.1247  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -3.553037   0.366553  -9.693  < 2e-16 ***
## DistFromHumanAccess2 -0.001422   0.000183  -7.770 7.86e-15 ***
## deer_w2               0.898069   0.077941  11.522  < 2e-16 ***
## goat_w2              -0.333540   0.048524  -6.874 6.26e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1398.2  on 2114  degrees of freedom
## AIC: 1406.2
## 
## Number of Fisher Scoring iterations: 7
## and the 2nd ranked top biotic model was  model 40
second.biotic <- glm(used ~ DistFromHumanAccess2+elk_w2 + goat_w2, family=binomial(logit), data=wolfkde2)
summary(second.biotic)
## 
## Call:
## glm(formula = used ~ DistFromHumanAccess2 + elk_w2 + goat_w2, 
##     family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8948  -0.6013  -0.2043  -0.0340   3.1924  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -3.5482326  0.3639984  -9.748  < 2e-16 ***
## DistFromHumanAccess2 -0.0013001  0.0001839  -7.068 1.57e-12 ***
## elk_w2                0.9358383  0.0803689  11.644  < 2e-16 ***
## goat_w2              -0.4169449  0.0489264  -8.522  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1401.0  on 2114  degrees of freedom
## AIC: 1409
## 
## Number of Fisher Scoring iterations: 7
Model set 2: Environmental Covariates Only
m.env <- list()
head(m.env)
## list()
## Model set 1: Biotic
m.env[[1]] <- glm(used ~ 1, family=binomial(logit), data=wolfkde2)
m.env[[2]] <- glm(used ~ Elevation2, family=binomial(logit), data=wolfkde2)
m.env[[3]] <- glm(used ~ DistFromHighHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[4]] <- glm(used ~ DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[5]] <- glm(used ~ openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[6]] <- glm(used ~ Elevation2 + DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[7]] <- glm(used ~ DistFromHighHumanAccess2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[8]] <- glm(used ~ DistFromHumanAccess2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[9]] <- glm(used ~ Elevation2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[10]] <- glm(used ~ Elevation2 + DistFromHumanAccess2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[11]] <- glm(used ~ Elevation2 + DistFromHighHumanAccess2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
m.env[[12]] <- glm(used ~ Elevation2 + DistFromHighHumanAccess2 + closed + closed*DistFromHighHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[13]] <- glm(used ~ Elevation2 + DistFromHumanAccess2 + closed + closed*DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[14]] <- glm(used ~ DistFromHighHumanAccess2 + closed + closed*DistFromHighHumanAccess2, family=binomial(logit), data=wolfkde2)
m.env[[15]] <- glm(used ~ DistFromHumanAccess2 + closed + closed*DistFromHumanAccess2, family=binomial(logit), data=wolfkde2)


model.names.env <-1:15

aictab(cand.set = m.env, modnames = model.names.env)
## 
## Model selection based on AICc:
## 
##     K    AICc Delta_AICc AICcWt Cum.Wt       LL
## 11 11 1320.41       0.00      1      1  -649.14
## 10 11 1333.29      12.89      0      1  -655.58
## 9  10 1344.81      24.41      0      1  -662.35
## 12  5 1364.80      44.39      0      1  -677.39
## 13  5 1383.67      63.26      0      1  -686.82
## 6   3 1390.24      69.84      0      1  -692.12
## 2   2 1409.40      88.99      0      1  -702.70
## 8  10 1534.04     213.63      0      1  -756.97
## 15  4 1622.70     302.29      0      1  -807.34
## 7  10 1650.52     330.11      0      1  -815.21
## 4   2 1655.96     335.55      0      1  -825.98
## 5   9 1696.15     375.74      0      1  -839.03
## 14  4 1887.34     566.94      0      1  -939.66
## 3   2 1931.58     611.17      0      1  -963.79
## 1   1 2042.90     722.49      0      1 -1020.45
#OK - top model is model 11
top.env <- glm(used ~ Elevation2 + DistFromHighHumanAccess2 + openConif+modConif+closedConif+mixed+herb+shrub+water+burn, family=binomial(logit), data=wolfkde2)
summary(top.env)
## 
## Call:
## glm(formula = used ~ Elevation2 + DistFromHighHumanAccess2 + 
##     openConif + modConif + closedConif + mixed + herb + shrub + 
##     water + burn, family = binomial(logit), data = wolfkde2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0290  -0.5020  -0.1576  -0.0366   3.2732  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               9.570e+00  8.805e-01  10.869  < 2e-16 ***
## Elevation2               -6.782e-03  4.883e-04 -13.888  < 2e-16 ***
## DistFromHighHumanAccess2  1.867e-04  3.511e-05   5.317 1.05e-07 ***
## openConif                 8.457e-01  4.404e-01   1.920   0.0548 .  
## modConif                 -1.716e-02  3.836e-01  -0.045   0.9643    
## closedConif              -1.126e-01  3.944e-01  -0.286   0.7752    
## mixed                     1.325e+00  5.435e-01   2.438   0.0148 *  
## herb                      8.564e-01  5.525e-01   1.550   0.1212    
## shrub                     5.781e-01  4.486e-01   1.289   0.1974    
## water                     8.559e-01  6.389e-01   1.340   0.1804    
## burn                      1.861e+00  4.629e-01   4.021 5.80e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1298.3  on 2107  degrees of freedom
## AIC: 1320.3
## 
## Number of Fisher Scoring iterations: 7
Now - which ‘set’ of covariates is best? Env? or Biotic?
AIC(top.env, top.biotic)
##            df      AIC
## top.env    11 1320.283
## top.biotic  4 1406.235
## Environmental model HANDS DOWN. 

## now go back and compare 'top' model to top model selected by AIC

AIC(top.forward, top.biotic, second.biotic, top.env)
##               df      AIC
## top.forward   10 1289.201
## top.biotic     4 1406.235
## second.biotic  4 1409.000
## top.env       11 1320.283
## AIC will overfit models and does not penalize for collinearity. 
OBJECTIVE 5.0 - model selection using the MuMIn Package
Also explore the use of package MuMIn - Mutlimodel inference
http://cran.r-project.org/web/packages/MuMIn/MuMIn.pdf
# re-run FULL logistic regression model
top.forward = glm(used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + closed + DistFromHighHumanAccess2*closed, data=wolfkde2,family=binomial(logit), na.action ="na.fail")
summary(top.forward)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.01010  -0.46000  -0.15835  -0.04175   3.12206  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      6.883e+00  1.375e+00   5.005 5.57e-07 ***
## deer_w2                          1.931e-01  1.181e-01   1.636 0.101885    
## elk_w2                           4.478e-01  1.256e-01   3.564 0.000365 ***
## moose_w2                        -4.165e-01  9.197e-02  -4.529 5.93e-06 ***
## sheep_w2                        -9.786e-02  5.500e-02  -1.779 0.075221 .  
## goat_w2                         -1.856e-01  6.021e-02  -3.082 0.002053 ** 
## Elevation2                      -4.762e-03  6.188e-04  -7.695 1.42e-14 ***
## DistFromHumanAccess2            -6.560e-04  1.984e-04  -3.305 0.000948 ***
## DistFromHighHumanAccess2         1.368e-04  5.302e-05   2.580 0.009883 ** 
## closed                          -4.357e-01  2.259e-01  -1.929 0.053765 .  
## DistFromHighHumanAccess2:closed  6.271e-05  5.518e-05   1.136 0.255759    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1267.9  on 2107  degrees of freedom
## AIC: 1289.9
## 
## Number of Fisher Scoring iterations: 7
#install and load MuMIn package
require(MuMIn)

#use dredge function to get all possible models
x1<-dredge(top.forward)

## x1 - looking at all 
## there are over 1000 models fit! 10! models = ? models
## note dredge has fit XX models in total out of this candidate set of 19 candidate variables

head(x1, n = 10) ## only shows top 10 models fit
## Global model call: glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## ---
## Model selection table 
##      (Int)     cls der_w2      DFHH       DFHA       El2 elk_w2  got_w2
## 512  6.609 -0.3139 0.2058 0.0001813 -0.0006645 -0.004725 0.4493 -0.1873
## 511  6.418         0.1859 0.0001850 -0.0006667 -0.004746 0.4677 -0.1800
## 1024 6.883 -0.4357 0.1931 0.0001368 -0.0006560 -0.004762 0.4478 -0.1856
## 509  7.420                0.0001804 -0.0006362 -0.005050 0.5823 -0.1808
## 510  7.671 -0.2740        0.0001765 -0.0006320 -0.005060 0.5779 -0.1872
## 1022 7.917 -0.4135        0.0001274 -0.0006244 -0.005083 0.5668 -0.1854
## 256  7.094 -0.3057 0.1853 0.0001818 -0.0006626 -0.004948 0.4240 -0.2115
## 768  7.376 -0.4449 0.1722 0.0001323 -0.0006534 -0.004978 0.4249 -0.2085
## 255  6.900         0.1666 0.0001855 -0.0006646 -0.004965 0.4416 -0.2034
## 253  7.783                0.0001814 -0.0006378 -0.005231 0.5472 -0.2022
##       mos_w2   shp_w2  cls:DFHH df   logLik   AICc delta weight
## 512  -0.3994 -0.10320           10 -634.601 1289.3  0.00  0.171
## 511  -0.4129 -0.10150            9 -635.842 1289.8  0.46  0.136
## 1024 -0.4165 -0.09786 6.271e-05 11 -633.929 1290.0  0.68  0.122
## 509  -0.4662 -0.09393            8 -637.118 1290.3  1.00  0.104
## 510  -0.4585 -0.09463            9 -636.149 1290.4  1.08  0.100
## 1022 -0.4743 -0.08905 7.065e-05 10 -635.279 1290.7  1.36  0.087
## 256  -0.4375                     9 -636.375 1290.8  1.53  0.080
## 768  -0.4550          7.031e-05 10 -635.502 1291.1  1.80  0.070
## 255  -0.4500                     8 -637.558 1291.2  1.88  0.067
## 253  -0.4961                     7 -638.605 1291.3  1.96  0.064
## Models ranked by AICc(x)
#get top models with AICc <2
top.models<-get.models(x1, subset=delta<2)
#top.models

#model average covariate effects
x6<-model.avg(top.models)
summary(x6)
## 
## Call:
## model.avg(object = top.models)
## 
## Component model call: 
## glm(formula = used ~ <11 unique rhs>, family = binomial(logit), 
##      data = wolfkde2, na.action = na.fail)
## 
## Component models: 
##                      df  logLik    AICc delta weight
## 1/2/3/4/5/6/7/8/9    10 -634.60 1289.31  0.00   0.16
## 2/3/4/5/6/7/8/9       9 -635.84 1289.77  0.46   0.13
## 1/2/3/4/5/6/7/8/9/10 11 -633.93 1289.98  0.68   0.11
## 3/4/5/6/7/8/9         8 -637.12 1290.30  1.00   0.10
## 1/3/4/5/6/7/8/9       9 -636.15 1290.38  1.08   0.09
## 1/3/4/5/6/7/8/9/10   10 -635.28 1290.66  1.36   0.08
## 1/2/3/4/5/6/7/8       9 -636.37 1290.83  1.53   0.07
## 1/2/3/4/5/6/7/8/10   10 -635.50 1291.11  1.80   0.07
## 2/3/4/5/6/7/8         8 -637.56 1291.18  1.88   0.06
## 3/4/5/6/7/8           7 -638.60 1291.26  1.96   0.06
## 1/3/4/5/6/7/8/10      9 -636.60 1291.28  1.98   0.06
## 
## Term codes: 
##                          closed                         deer_w2 
##                               1                               2 
##        DistFromHighHumanAccess2            DistFromHumanAccess2 
##                               3                               4 
##                      Elevation2                          elk_w2 
##                               5                               6 
##                         goat_w2                        moose_w2 
##                               7                               8 
##                        sheep_w2 closed:DistFromHighHumanAccess2 
##                               9                              10 
## 
## Model-averaged coefficients:  
## (full average) 
##                                   Estimate Std. Error Adjusted SE z value
## (Intercept)                      7.178e+00  1.409e+00   1.409e+00   5.093
## closed                          -2.374e-01  2.495e-01   2.496e-01   0.951
## deer_w2                          1.146e-01  1.301e-01   1.301e-01   0.881
## DistFromHighHumanAccess2         1.654e-04  4.748e-05   4.750e-05   3.482
## DistFromHumanAccess2            -6.499e-04  1.987e-04   1.989e-04   3.268
## Elevation2                      -4.935e-03  6.315e-04   6.318e-04   7.810
## elk_w2                           4.932e-01  1.314e-01   1.314e-01   3.752
## goat_w2                         -1.916e-01  6.067e-02   6.071e-02   3.156
## moose_w2                        -4.431e-01  9.399e-02   9.404e-02   4.712
## sheep_w2                        -6.612e-02  6.422e-02   6.424e-02   1.029
## closed:DistFromHighHumanAccess2  2.216e-05  4.480e-05   4.481e-05   0.495
##                                 Pr(>|z|)    
## (Intercept)                      4.0e-07 ***
## closed                          0.341586    
## deer_w2                         0.378544    
## DistFromHighHumanAccess2        0.000497 ***
## DistFromHumanAccess2            0.001082 ** 
## Elevation2                       < 2e-16 ***
## elk_w2                          0.000175 ***
## goat_w2                         0.001598 ** 
## moose_w2                         2.5e-06 ***
## sheep_w2                        0.303377    
## closed:DistFromHighHumanAccess2 0.620916    
##  
## (conditional average) 
##                                   Estimate Std. Error Adjusted SE z value
## (Intercept)                      7.178e+00  1.409e+00   1.409e+00   5.093
## closed                          -3.645e-01  2.220e-01   2.221e-01   1.641
## deer_w2                          1.890e-01  1.177e-01   1.178e-01   1.605
## DistFromHighHumanAccess2         1.654e-04  4.748e-05   4.750e-05   3.482
## DistFromHumanAccess2            -6.499e-04  1.987e-04   1.989e-04   3.268
## Elevation2                      -4.935e-03  6.315e-04   6.318e-04   7.810
## elk_w2                           4.932e-01  1.314e-01   1.314e-01   3.752
## goat_w2                         -1.916e-01  6.067e-02   6.071e-02   3.156
## moose_w2                        -4.431e-01  9.399e-02   9.404e-02   4.712
## sheep_w2                        -9.775e-02  5.482e-02   5.485e-02   1.782
## closed:DistFromHighHumanAccess2  6.891e-05  5.495e-05   5.498e-05   1.253
##                                 Pr(>|z|)    
## (Intercept)                     3.50e-07 ***
## closed                          0.100836    
## deer_w2                         0.108553    
## DistFromHighHumanAccess2        0.000497 ***
## DistFromHumanAccess2            0.001082 ** 
## Elevation2                       < 2e-16 ***
## elk_w2                          0.000175 ***
## goat_w2                         0.001598 ** 
## moose_w2                        2.45e-06 ***
## sheep_w2                        0.074766 .  
## closed:DistFromHighHumanAccess2 0.210078    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Relative variable importance: 
##                      DistFromHighHumanAccess2 DistFromHumanAccess2
## Importance:          1.00                     1.00                
## N containing models:   11                       11                
##                      Elevation2 elk_w2 goat_w2 moose_w2 sheep_w2 closed
## Importance:          1.00       1.00   1.00    1.00     0.68     0.65  
## N containing models:   11         11     11      11        6        7  
##                      deer_w2 closed:DistFromHighHumanAccess2
## Importance:          0.61    0.32                           
## N containing models:    6       4
top.dredge.lc = glm(used ~ openConif+modConif+closedConif+mixed+herb+shrub+water+burn+decid+regen+alpine, data=wolfkde2,family=binomial(logit), na.action ="na.fail")
x2<-dredge(top.dredge.lc)
head(x2, n=10)
## Global model call: glm(formula = used ~ openConif + modConif + closedConif + mixed + 
##     herb + shrub + water + burn + decid + regen + alpine, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## ---
## Model selection table 
##      (Intrc)   alpin  burn clsdC  decid  herb mixed mdCnf opnCn shrub
## 1783  -3.970         4.092 2.022        3.970 4.773 2.923 3.305 3.109
## 2039  -3.970         4.092 2.022        3.970 4.773 2.923 3.305 3.109
## 1784  -4.025  0.3878 4.147 2.077        4.025 4.828 2.978 3.360 3.164
## 2040  -4.025  0.3878 4.147 2.077        4.025 4.828 2.978 3.360 3.164
## 1791  -3.966         4.087 2.017 -9.601 3.966 4.768 2.918 3.300 3.104
## 2047  -3.966         4.087 2.017 -9.601 3.966 4.768 2.918 3.300 3.104
## 1792  -4.020  0.3824 4.141 2.072 -9.546 4.020 4.822 2.973 3.354 3.158
## 2048  -4.020  0.3824 4.141 2.072 -9.546 4.020 4.822 2.973 3.354 3.158
## 1780  -2.662 -0.9753 2.784              2.662 3.465 1.615 1.996 1.801
## 2036  -2.662 -0.9753 2.784              2.662 3.465 1.615 1.996 1.801
##      water df   logLik   AICc delta weight
## 1783 4.376  9 -839.034 1696.2  0.00  0.256
## 2039 4.376  9 -839.034 1696.2  0.00  0.256
## 1784 4.431 10 -838.921 1697.9  1.79  0.105
## 2040 4.431 10 -838.921 1697.9  1.79  0.105
## 1791 4.371 10 -838.977 1698.1  1.91  0.099
## 2047 4.371 10 -838.977 1698.1  1.91  0.099
## 1792 4.425 11 -838.868 1699.9  3.71  0.040
## 2048 4.425 11 -838.868 1699.9  3.71  0.040
## 1780 3.068  9 -864.294 1746.7 50.52  0.000
## 2036 3.068  9 -864.294 1746.7 50.52  0.000
## Models ranked by AICc(x)

Objective 6. Model Selection using BIC

Note we will use the function BIC() in the base {stats4} pacakge
This generic function calculates the Bayesian information criterion, also known as Schwarz’s Bayesian criterion (SBC), for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2log-likelihood + nparlog(nobs), where npar represents the number of parameters and nobs the number of observations in the fitted model.
In my estimation, BIC tends to be more conservative in preventing model overfitting because it does not consider K the ‘penalty’ function, but instead, considers K*log(n) where n is the number of rows of data, as the penalty function. Thus, it calculates a bigger penalty for larger datasets, which gaurds against overfitting. ##### there are not as many functions out there to calculate model selection using BIC ##### today we will use BIC in the package MuMIn
## First manually
AIC(top.forward, top.biotic, second.biotic, top.env)
##               df      AIC
## top.forward   11 1289.858
## top.biotic     4 1406.235
## second.biotic  4 1409.000
## top.env       11 1320.283
BIC(top.forward, top.biotic, second.biotic, top.env)
##               df      BIC
## top.forward   11 1352.099
## top.biotic     4 1428.868
## second.biotic  4 1431.633
## top.env       11 1382.524
## OK - so not much difference in top models using BIC and AIC in this dataset

## Now lets use the dredge function with BIC

x1.bic<-dredge(top.forward, rank=BIC) ## note this now ranks using BIC

## x1.bic - look at all 

head(x1.bic, n = 10) ## only shows top 10 models fit
## Global model call: glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## ---
## Model selection table 
##      (Int)     cls der_w2      DFHH       DFHA       El2 elk_w2  got_w2
## 253  7.783                0.0001814 -0.0006378 -0.005231 0.5472 -0.2022
## 245  8.768                0.0001893            -0.006011 0.5692 -0.1966
## 189  9.031                0.0001925 -0.0006260 -0.006066 0.4386        
## 509  7.420                0.0001804 -0.0006362 -0.005050 0.5823 -0.1808
## 255  6.900         0.1666 0.0001855 -0.0006646 -0.004965 0.4416 -0.2034
## 254  8.033 -0.2702        0.0001774 -0.0006339 -0.005242 0.5430 -0.2091
## 445  8.354                0.0001889 -0.0006240 -0.005691 0.5071        
## 181 10.010                0.0002013            -0.006832 0.4615        
## 501  8.388                0.0001874            -0.005824 0.6059 -0.1747
## 511  6.418         0.1859 0.0001850 -0.0006667 -0.004746 0.4677 -0.1800
##      mos_w2   shp_w2 df   logLik    BIC delta weight
## 253 -0.4961           7 -638.605 1330.8  0.00  0.659
## 245 -0.5242           6 -644.494 1334.9  4.12  0.084
## 189 -0.4672           6 -644.751 1335.5  4.63  0.065
## 509 -0.4662 -0.09393  8 -637.118 1335.5  4.68  0.063
## 255 -0.4500           8 -637.558 1336.4  5.56  0.041
## 254 -0.4889           8 -637.659 1336.6  5.77  0.037
## 445 -0.4383 -0.12620  7 -641.770 1337.1  6.33  0.028
## 181 -0.4964           5 -650.382 1339.1  8.24  0.011
## 501 -0.4945 -0.09293  7 -643.023 1339.7  8.84  0.008
## 511 -0.4129 -0.10150  9 -635.842 1340.6  9.79  0.005
## Models ranked by BIC(x)
# lets compare the top model from AIC and BIC
head(x1.bic, n = 1) ## only shows top 1 models fit with BIC
## Global model call: glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## ---
## Model selection table 
##     (Int)      DFHH       DFHA       El2 elk_w2  got_w2  mos_w2 df
## 253 7.783 0.0001814 -0.0006378 -0.005231 0.5472 -0.2022 -0.4961  7
##       logLik    BIC delta weight
## 253 -638.605 1330.8     0      1
## Models ranked by BIC(x)
head(x1, n = 1) ## only shows top 1 models fit with AIC
## Global model call: glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## ---
## Model selection table 
##     (Int)     cls der_w2      DFHH       DFHA       El2 elk_w2  got_w2
## 512 6.609 -0.3139 0.2058 0.0001813 -0.0006645 -0.004725 0.4493 -0.1873
##      mos_w2  shp_w2 df   logLik   AICc delta weight
## 512 -0.3994 -0.1032 10 -634.601 1289.3     0      1
## Models ranked by AICc(x)
## So AIC is overfitting here potentially. 

#get top models with BIC <2
top.models.bic<-get.models(x1.bic, subset=delta<2)
#top.models.bic ## note there is only 1 top model here using BIC

#model average covariate effects
##x.top.bic<-model.avg(top.models.bic) ## only 1 top model, so this doesnt work

## Lets run the 'top' model selected using BIC for next week
top.model.bic = glm(used ~ DistFromHighHumanAccess2 + DistFromHumanAccess2+Elevation2+elk_w2+goat_w2+moose_w2, data=wolfkde2,family=binomial(logit), na.action ="na.fail")
summary(top.model.bic)
## 
## Call:
## glm(formula = used ~ DistFromHighHumanAccess2 + DistFromHumanAccess2 + 
##     Elevation2 + elk_w2 + goat_w2 + moose_w2, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8576  -0.4580  -0.1602  -0.0385   3.2508  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               7.783e+00  1.180e+00   6.597 4.19e-11 ***
## DistFromHighHumanAccess2  1.814e-04  3.428e-05   5.292 1.21e-07 ***
## DistFromHumanAccess2     -6.378e-04  1.980e-04  -3.221 0.001278 ** 
## Elevation2               -5.231e-03  5.880e-04  -8.896  < 2e-16 ***
## elk_w2                    5.472e-01  1.013e-01   5.401 6.64e-08 ***
## goat_w2                  -2.022e-01  5.867e-02  -3.447 0.000568 ***
## moose_w2                 -4.961e-01  8.349e-02  -5.943 2.80e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1277.2  on 2111  degrees of freedom
## AIC: 1291.2
## 
## Number of Fisher Scoring iterations: 7
## compare to top AIC model
summary(top.forward)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + DistFromHighHumanAccess2 * closed, family = binomial(logit), 
##     data = wolfkde2, na.action = "na.fail")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.01010  -0.46000  -0.15835  -0.04175   3.12206  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      6.883e+00  1.375e+00   5.005 5.57e-07 ***
## deer_w2                          1.931e-01  1.181e-01   1.636 0.101885    
## elk_w2                           4.478e-01  1.256e-01   3.564 0.000365 ***
## moose_w2                        -4.165e-01  9.197e-02  -4.529 5.93e-06 ***
## sheep_w2                        -9.786e-02  5.500e-02  -1.779 0.075221 .  
## goat_w2                         -1.856e-01  6.021e-02  -3.082 0.002053 ** 
## Elevation2                      -4.762e-03  6.188e-04  -7.695 1.42e-14 ***
## DistFromHumanAccess2            -6.560e-04  1.984e-04  -3.305 0.000948 ***
## DistFromHighHumanAccess2         1.368e-04  5.302e-05   2.580 0.009883 ** 
## closed                          -4.357e-01  2.259e-01  -1.929 0.053765 .  
## DistFromHighHumanAccess2:closed  6.271e-05  5.518e-05   1.136 0.255759    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1267.9  on 2107  degrees of freedom
## AIC: 1289.9
## 
## Number of Fisher Scoring iterations: 7

Objective 7. Variable reduction using PCA - Principle Components Analysis

Note princomp is a base function of {stats4}
head(wolfkde2)
##   deer_w2 moose_w2 elk_w2 sheep_w2 goat_w2 wolf_w2 Elevation2
## 1       4        5      5        3       3       5   1766.146
## 2       4        4      4        1       3       4   1788.780
## 3       4        5      5        4       1       5   1765.100
## 4       4        5      5        4       1       5   1742.913
## 6       1        1      1        1       4       1   1778.360
## 7       4        5      5        4       1       5   1764.313
##   DistFromHumanAccess2 DistFromHighHumanAccess2 landcover16 EASTING
## 1             427.3962                9367.8168           8  580840
## 2             360.5043               10398.5999           2  580000
## 3             283.6648               10296.5167           2  579800
## 4             167.4134                6347.8193           2  583803
## 6             622.6257                 723.7941          13  588573
## 7             373.2101                9331.2403           2  580785
##   NORTHING     pack used usedFactor      habitatType        landcov.f
## 1  5724800 Red Deer    1          1            Shrub            Shrub
## 2  5724195 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 3  5724800 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 4  5725654 Red Deer    1          1 Moderate Conifer Moderate Conifer
## 6  5728804 Red Deer    1          1   Burn-Grassland             Burn
## 7  5724966 Red Deer    1          1 Moderate Conifer Moderate Conifer
##   closedConif modConif openConif decid regen mixed herb shrub water
## 1           0        0         0     0     0     0    0     1     0
## 2           0        1         0     0     0     0    0     0     0
## 3           0        1         0     0     0     0    0     0     0
## 4           0        1         0     0     0     0    0     0     0
## 6           0        0         0     0     0     0    0     0     0
## 7           0        1         0     0     0     0    0     0     0
##   rockIce burn alpineHerb alpineShrub alpine closed closedFactor
## 1       0    0          0           0      0      0            0
## 2       0    0          0           0      0      1            1
## 3       0    0          0           0      0      1            1
## 4       0    0          0           0      0      1            1
## 6       0    1          0           0      0      1            1
## 7       0    0          0           0      0      1            1
pcawolf <-princomp(na.omit(wolfkde2[1:9]), cor=TRUE)
summary(pcawolf)
## Importance of components:
##                          Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.243495 1.2678247 0.87957691 0.71704964 0.62623282
## Proportion of Variance 0.559252 0.1785977 0.08596173 0.05712891 0.04357417
## Cumulative Proportion  0.559252 0.7378498 0.82381149 0.88094040 0.92451457
##                            Comp.6     Comp.7    Comp.8      Comp.9
## Standard deviation     0.60241949 0.38531793 0.3395040 0.229622996
## Proportion of Variance 0.04032325 0.01649666 0.0128070 0.005858524
## Cumulative Proportion  0.96483782 0.98133448 0.9941415 1.000000000
loadings(pcawolf)
## 
## Loadings:
##                          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## deer_w2                  -0.406         0.142 -0.178 -0.231 -0.348  0.202
## moose_w2                 -0.370         0.122         0.132  0.881  0.140
## elk_w2                   -0.402 -0.207  0.188        -0.160 -0.109  0.129
## sheep_w2                        -0.680 -0.178         0.675 -0.133 -0.129
## goat_w2                   0.186 -0.570 -0.395        -0.639  0.226 -0.142
## wolf_w2                  -0.415 -0.185  0.136 -0.107 -0.169 -0.138  0.179
## Elevation2                0.408 -0.162                              0.895
## DistFromHumanAccess2      0.318         0.351 -0.853               -0.182
## DistFromHighHumanAccess2  0.233 -0.299  0.775  0.467               -0.140
##                          Comp.8 Comp.9
## deer_w2                   0.633  0.398
## moose_w2                  0.148       
## elk_w2                   -0.744  0.389
## sheep_w2                              
## goat_w2                               
## wolf_w2                         -0.826
## Elevation2                            
## DistFromHumanAccess2                  
## DistFromHighHumanAccess2  0.111       
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.111  0.111  0.111  0.111  0.111  0.111  0.111  0.111
## Cumulative Var  0.111  0.222  0.333  0.444  0.556  0.667  0.778  0.889
##                Comp.9
## SS loadings     1.000
## Proportion Var  0.111
## Cumulative Var  1.000
plot(pcawolf, type="lines")

biplot(pcawolf, xlim =c(-0.06, 0.04))

wolfkde2$Comp.1 <- -0.406*wolfkde2$deer_w2 - 0.370*wolfkde2$moose_w2 - 0.402*wolfkde2$elk_w2 +0.182*wolfkde2$goat_w2 - 0.415*wolfkde2$wolf_w2 + 0.408*wolfkde2$Elevation2 + 0.318*wolfkde2$DistFromHumanAccess2 + 0.233*wolfkde2$DistFromHighHumanAccess2

wolf_comp1 <- glm(used ~ Comp.1, family=binomial (logit), data=wolfkde2)
wolfkde2$fitted1 <- fitted(wolf_comp1)
hist(wolfkde2$fitted1)

plot(wolfkde2$fitted1, wolfkde2$Comp.1)

figPCA <- ggplot(wolfkde2, aes(x=Comp.1, y=used)) + stat_smooth(method="glm", method.args = list(family="binomial"))
x.axis = "-0.406*deer - 0.370*moose - 0.402*elk +0.182*goat - 0.415*wolf + 0.408*Elev + 0.318*DistHumans + 0.233*DistHighHumans"
figPCA2 <- figPCA + xlab(x.axis)
figPCA2

OBJECTIVE 8.0 - Caterpillar plots of coefficients
require(ggplot2)

# run logistic regression model
summary(full.model)
## 
## Call:
## glm(formula = used ~ deer_w2 + elk_w2 + moose_w2 + sheep_w2 + 
##     goat_w2 + Elevation2 + DistFromHumanAccess2 + DistFromHighHumanAccess2 + 
##     closed + closed * DistFromHighHumanAccess2, family = binomial(logit), 
##     data = wolfkde2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.01010  -0.46000  -0.15835  -0.04175   3.12206  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      6.883e+00  1.375e+00   5.005 5.57e-07 ***
## deer_w2                          1.931e-01  1.181e-01   1.636 0.101885    
## elk_w2                           4.478e-01  1.256e-01   3.564 0.000365 ***
## moose_w2                        -4.165e-01  9.197e-02  -4.529 5.93e-06 ***
## sheep_w2                        -9.786e-02  5.500e-02  -1.779 0.075221 .  
## goat_w2                         -1.856e-01  6.021e-02  -3.082 0.002053 ** 
## Elevation2                      -4.762e-03  6.188e-04  -7.695 1.42e-14 ***
## DistFromHumanAccess2            -6.560e-04  1.984e-04  -3.305 0.000948 ***
## DistFromHighHumanAccess2         1.368e-04  5.302e-05   2.580 0.009883 ** 
## closed                          -4.357e-01  2.259e-01  -1.929 0.053765 .  
## DistFromHighHumanAccess2:closed  6.271e-05  5.518e-05   1.136 0.255759    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2040.9  on 2117  degrees of freedom
## Residual deviance: 1267.9  on 2107  degrees of freedom
## AIC: 1289.9
## 
## Number of Fisher Scoring iterations: 7
B<-summary(full.model)$coefficient[1:length(summary(full.model)$coefficient[,1]),1]
#create margin of error (ME) for 95% CI
ME <- summary(full.model)$coefficient[1:length(summary(full.model)$coefficient[,1]),2]*1.96
lower<-B - ME
upper<-B + ME


# bundle into data frame
logisData<-data.frame(B, lower, upper, names(summary(full.model)$coefficient[,2]))
names(logisData) <- c("Coefficient", "lower.ci", "upper.ci", "Variable")
levels(logisData$Variable)[1] <- "Intercept"
logisData$Variable <- relevel(logisData$Variable, ref="Intercept")

pd <- position_dodge(0.6) # move them .05 to the left and right
x1<-ggplot(data=logisData, aes(x=Variable,y=Coefficient)) +
  geom_errorbar(data=logisData,aes(ymin=lower.ci, ymax=upper.ci), width=.4,position=pd,size=1) +
  geom_point(size=3, col="blue") 

p6<-x1+theme(axis.text.y = element_text(size=14, family="Times"),axis.text.x = element_text(size=14, family="Times"),text = element_text(size=16, family="Times"),axis.title.x=element_text(size=16, family="Times"),axis.title.y=element_text(size=16, family="Times",vjust=1))
p7<-p6+theme(axis.line.x = element_line(color="black", size = 0.25),
             axis.line.y = element_line(color="black", size = 0.25),legend.title=element_blank(),legend.text=element_text(size=16, family="Times"))+ylab("Estimate") + xlab("Coefficient")

p7

tiff("coefPlot.tiff", res=600, compression = "lzw", height=5, width=7, units="in")
p7
dev.off()
## quartz_off_screen 
##                 2
8.1 GGally package - or check out the cool GGally package
require(GGally)
ggcoef(full.model)

## super cool!